Using Hints to Improve Inline Block-layer Deduplication
نویسندگان
چکیده
Block-layer data deduplication allows file systems and applications to reap the benefits of deduplication without requiring per-system or per-application modifications. However, important information about data context (e.g., data vs. metadata writes) is lost at the block layer. Passing such context to the block layer can help improve deduplication performance and reliability. We implemented a hinting interface in an open-source block-layer deduplication system, dmdedup, that passes relevant context to the block layer, and evaluated two hints, NODEDUP and PREFETCH. To allow upper storage layers to pass hints based on the available context, we modified the VFS and file system layers to expose a hinting interface to user applications. We show that passing the NODEDUP hint speeds up applications by up to 5.3× on modern machines because the overhead of deduplication is avoided when it is unlikely to be beneficial. We also show that the PREFETCH hint accelerates applications up to 1.8× by caching hashes for data that is likely to be accessed soon.
منابع مشابه
Design and Implementation of an Open-Source Deduplication Platform for Research A RESEARCH PROFICIENCY EXAM PRESENTED BY
Data deduplication is a technique used to improve storage utilization by eliminating duplicate data. Duplicate data blocks are not stored and instead a reference to the original data block is updated. Unique data chunks are identified using techniques such as hashing, and an index of all the existing chunks is maintained. When new data blocks are written, they are hashed and compared to the has...
متن کاملSparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality
© Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat, Vinay Deolalikar, Greg Trezise, Peter Camble
متن کاملHPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud
Eliminating duplicate data in primary storage of clouds increases the cost-efficiency of cloud service providers as well as reduces the cost of users for using cloud services. Most existing primary deduplication techniques either use inline caching to exploit locality in primary workloads or use postprocessing deduplication running in system idle time to avoid the negative impact on I/O perform...
متن کاملA Context Aware Block Layer: The Case for Block Layer Deduplication
of the Thesis A Context Aware Block Layer: The Case for Block Layer Deduplication
متن کاملChunkStash: Speeding Up Inline Storage Deduplication Using Flash Memory
Storage deduplication has received recent interest in the research community. In scenarios where the backup process has to complete within short time windows, inline deduplication can help to achieve higher backup throughput. In such systems, the method of identifying duplicate data, using disk-based indexes on chunk hashes, can create throughput bottlenecks due to disk I/Os involved in index l...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016